AITopics | out-of-sample performance


out-of-sample performance


Is Cross-validation the Gold Standard to Estimate Out-of-sample Model Performance?

Neural Information Processing Systems, Mar-22-2026, 01:02:29 GMT

Cross-validation (CV) is the default choice for estimating the out-of-sample performance of machine learning models. Despite its wide usage, its statistical benefits have remained half-understood, especially in challenging nonparametric regimes. In this paper we fill this gap and show that, in terms of estimating out-of-sample performance, for a wide spectrum of models, CV does not statistically outperform the simple "plug-in" approach where one reuses training data for testing evaluation. Specifically, in terms of both the asymptotic bias and coverage accuracy of the associated interval for out-of-sample evaluation, $K$-fold CV provably cannot outperform plug-in regardless of the rate at which the parametric or nonparametric models converge. Leave-one-out CV can have a smaller bias than plug-in; however, this bias improvement is negligible compared to the variability of the evaluation, and in some important cases leave-one-out again does not outperform plug-in once this variability is taken into account. We obtain our theoretical comparisons via a novel higher-order Taylor analysis that dissects the limit theorems of testing evaluations, which applies to model classes that are not amenable to previously known sufficient conditions. Our numerical results demonstrate that plug-in indeed performs no worse than CV in estimating model performance across a wide range of examples.
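To make the two estimators in this abstract concrete, here is a minimal sketch that computes a plug-in estimate and a $K$-fold CV estimate of mean squared error. It does not reproduce the paper's analysis or experiments. The least-squares model, the synthetic data, and the helper names (`fit_ols`, `plug_in_estimate`, `k_fold_estimate`) are all assumptions chosen for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (illustrative model choice, not from the paper)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mse(X, y, coef):
    """Mean squared prediction error of a linear model."""
    return float(np.mean((y - X @ coef) ** 2))

def plug_in_estimate(X, y):
    """Plug-in: train on all data, then evaluate on the same data."""
    return mse(X, y, fit_ols(X, y))

def k_fold_estimate(X, y, k=5, seed=0):
    """K-fold CV: average held-out error over k disjoint folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        coef = fit_ols(X[train], y[train])
        scores.append(mse(X[held_out], y[held_out], coef))
    return float(np.mean(scores))

# Synthetic regression data (hypothetical): unit coefficients, unit noise variance.
rng = np.random.default_rng(42)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ np.ones(d) + rng.normal(size=n)

plug_in = plug_in_estimate(X, y)
cv = k_fold_estimate(X, y, k=5)
print(f"plug-in MSE: {plug_in:.3f}, 5-fold CV MSE: {cv:.3f}")
```

Both numbers are estimates of the same out-of-sample quantity; the abstract's claims concern how their bias and interval coverage compare, which a single run like this cannot show.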

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare

Miller, Elizabeth W., Blume, Jeffrey D.

arXiv.org Machine Learning, Mar-3-2026

In healthcare, predictive models increasingly inform patient-level decisions, yet little attention is paid to the variability in individual risk estimates and its impact on treatment decisions. For overparameterized models, now standard in machine learning, a substantial source of variability often goes undetected. Even when the data and model architecture are held fixed, randomness introduced by optimization and initialization can lead to materially different risk estimates for the same patient. This problem is largely obscured by standard evaluation practices, which rely on aggregate performance metrics (e.g., log-loss, accuracy) that are agnostic to individual-level stability. As a result, models with indistinguishable aggregate performance can nonetheless exhibit substantial procedural arbitrariness, which can undermine clinical trust. We propose an evaluation framework that quantifies individual-level prediction instability using two complementary diagnostics: empirical prediction interval width (ePIW), which captures variability in continuous risk estimates, and empirical decision flip rate (eDFR), which measures instability in threshold-based clinical decisions. We apply these diagnostics to simulated data and the GUSTO-I clinical dataset. Across observed settings, we find that for flexible machine-learning models, randomness arising solely from optimization and initialization can induce individual-level variability comparable to that produced by resampling the entire training dataset. Neural networks exhibit substantially greater instability in individual risk predictions than logistic regression models. Risk estimate instability near clinically relevant decision thresholds can alter treatment recommendations. These findings suggest that stability diagnostics should be incorporated into routine model validation for assessing clinical reliability.
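The abstract names the two diagnostics but does not give their formulas, so the sketch below uses assumed definitions: an interval width taken from empirical quantiles across repeated training runs, and a flip rate taken as the share of runs whose thresholded decision disagrees with the majority decision. The function names, the 95% level, the 0.5 threshold, and the toy risk matrix are all hypothetical and may differ from the paper's ePIW and eDFR.

```python
import numpy as np

def empirical_interval_width(risks, level=0.95):
    """Per-patient width of the central `level` interval across runs (assumed definition)."""
    lo = (1.0 - level) / 2.0
    lower, upper = np.quantile(risks, [lo, 1.0 - lo], axis=0)
    return upper - lower

def empirical_flip_rate(risks, threshold=0.5):
    """Per-patient share of runs disagreeing with the majority decision (assumed definition)."""
    decisions = risks >= threshold          # shape: (n_runs, n_patients)
    treat_share = decisions.mean(axis=0)    # fraction of runs recommending treatment
    return np.minimum(treat_share, 1.0 - treat_share)

# Hypothetical predicted risks: rows are training runs (e.g. different seeds),
# columns are patients. Patient 1 sits near the decision threshold.
risks = np.array([
    [0.10, 0.48, 0.90],
    [0.12, 0.52, 0.88],
    [0.11, 0.55, 0.91],
    [0.09, 0.45, 0.89],
])

width = empirical_interval_width(risks)
flip = empirical_flip_rate(risks, threshold=0.5)
print("interval width per patient:", np.round(width, 4))
print("flip rate per patient:", flip)
```

In this toy matrix only the patient whose risk straddles the threshold has a nonzero flip rate, which is the kind of individual-level instability that aggregate metrics do not reveal.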

artificial intelligence, instability, machine learning, (16 more...)

arXiv.org Machine Learning

2603.00192

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
Asia > Middle East > Saudi Arabia (0.04)
Asia > India > Maharashtra > Mumbai (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)


402542c2341e5d2eadc1dd0891275901-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 19:37:08 GMT

decision support system, experiment, machine learning, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Arizona (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Decision Support Systems (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)


Wasserstein Distributionally Robust Nash Equilibrium Seeking with Heterogeneous Data: A Lagrangian Approach

Wang, Zifan, Pantazis, Georgios, Grammatico, Sergio, Zavlanos, Michael M., Johansson, Karl H.

arXiv.org Artificial Intelligence, Dec-8-2025

We study a class of distributionally robust games where agents are allowed to heterogeneously choose their risk aversion with respect to distributional shifts of the uncertainty. In our formulation, heterogeneous Wasserstein ball constraints on each distribution are enforced through a penalty function leveraging a Lagrangian formulation. We then formulate the distributionally robust game as a variational inequality problem, and show that under certain assumptions the original seemingly infinite-dimensional Nash equilibrium problem is equivalent to a multi-agent but finite-dimensional variational inequality problem with a strongly monotone mapping. Due to the inner maximization problem, it is however still challenging to calculate a distributionally robust Nash equilibrium. To this end, we design an approximate Nash equilibrium seeking algorithm and prove convergence of the average regret to a quantity that diminishes with the number of iterations, thus learning the desired equilibrium up to an a priori specified accuracy. Numerical simulations corroborate our theoretical findings.

artificial intelligence, equilibrium, game theory, (15 more...)

arXiv.org Artificial Intelligence

2511.14048

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.40)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)


Optimization and Regularization Under Arbitrary Objectives

Lakhani, Jared N., Pienaar, Etienne

arXiv.org Machine Learning, Nov-26-2025

This study investigates the limitations of applying Markov Chain Monte Carlo (MCMC) methods to arbitrary objective functions, focusing on a two-block MCMC framework which alternates between Metropolis-Hastings and Gibbs sampling. While such approaches are often considered advantageous for enabling data-driven regularization, we show that their performance critically depends on the sharpness of the employed likelihood form. By introducing a sharpness parameter and exploring alternative likelihood formulations proportional to the target objective function, we demonstrate how likelihood curvature governs both in-sample performance and the degree of regularization inferred from the training data. Empirical applications are conducted on reinforcement learning tasks, including a navigation problem and the game of tic-tac-toe. The study concludes with a separate analysis examining the implications of extreme likelihood sharpness on arbitrary objective functions stemming from the classic game of blackjack, where the first block of the two-block MCMC framework is replaced with an iterative optimization step. The resulting hybrid approach achieves performance nearly identical to the original MCMC framework, indicating that excessive likelihood sharpness effectively collapses posterior mass onto a single dominant mode.
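As a rough illustration of how a sharpness parameter can collapse posterior mass onto a mode, the sketch below runs a plain random-walk Metropolis sampler on a density proportional to exp(sharpness × objective). This is not the paper's two-block Metropolis-Hastings and Gibbs framework. The exponential form, the quadratic objective, and every name and setting here are assumptions made for illustration.

```python
import numpy as np

def objective(theta):
    """Hypothetical objective to maximise, with its maximum at theta = 2."""
    return -(theta - 2.0) ** 2

def metropolis(objective, sharpness, n_steps=5000, step=0.5, start=0.0, seed=0):
    """Random-walk Metropolis targeting a density proportional to exp(sharpness * objective)."""
    rng = np.random.default_rng(seed)
    theta = start
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = theta + step * rng.normal()
        # Log acceptance ratio for a symmetric proposal.
        log_ratio = sharpness * (objective(proposal) - objective(theta))
        if np.log(rng.random()) < log_ratio:
            theta = proposal
        samples[t] = theta
    return samples

burn_in = 1000
flat = metropolis(objective, sharpness=1.0)[burn_in:]
sharp = metropolis(objective, sharpness=50.0)[burn_in:]
print(f"sharpness 1:  mean {flat.mean():.2f}, std {flat.std():.2f}")
print(f"sharpness 50: mean {sharp.mean():.2f}, std {sharp.std():.2f}")
```

With the larger sharpness value the samples cluster tightly around the maximiser, so the sampler behaves much like an optimiser, which is the regime the abstract describes as collapsing onto a single dominant mode.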

likelihood, posterior, regularization, (15 more...)

arXiv.org Machine Learning

2511.19628

Country: Africa > South Africa > Western Cape > Cape Town (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Tic-Tac-Toe (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)


On Optimal Generalizability in Parametric Learning

Ahmad Beirami, Meisam Razaviyayn, Shahin Shahrampour, Vahid Tarokh

Neural Information Processing Systems, Nov-21-2025, 12:14:06 GMT

We consider the parametric learning problem, where the objective of the learner is determined by a parametric loss function. Employing empirical risk minimization, possibly with regularization, the inferred parameter vector will be biased toward the training samples. In practice, this bias is measured by cross validation, where the data set is partitioned into a training set and a validation set, which is not used in training and is held out to measure the out-of-sample performance. A classical cross validation strategy is leave-one-out cross validation (LOOCV), where one sample is left out for validation, training is done on the rest of the samples presented to the learner, and this process is repeated for every sample. LOOCV is rarely used in practice due to its high computational complexity. In this paper, we first develop a computationally efficient approximate LOOCV (ALOOCV) and provide theoretical guarantees for its performance. Then we use ALOOCV to provide an optimization algorithm for finding the regularizer in the empirical risk minimization framework. In our numerical experiments, we illustrate the accuracy and efficiency of ALOOCV as well as our proposed framework for the optimization of the regularizer.
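The cost of LOOCV described above can be seen in a small sketch. The brute-force version refits the model once per sample, while for ridge regression with a fixed penalty a classical closed-form identity gives the same leave-one-out residuals from a single fit. This is not the paper's ALOOCV, which targets general parametric losses; ridge regression, the penalty value, and all function names are assumptions chosen because the shortcut is exact in that case.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge coefficients: solve (X^T X + lam * I) coef = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loocv_brute_force(X, y, lam):
    """Refit n times, each time leaving one sample out (n model fits)."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = ridge_fit(X[keep], y[keep], lam)
        errors[i] = (y[i] - X[i] @ coef) ** 2
    return float(errors.mean())

def loocv_closed_form(X, y, lam):
    """Single fit: leave-one-out residual equals residual / (1 - leverage)."""
    d = X.shape[1]
    a_inv_xt = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)  # shape (d, n)
    leverage = np.einsum("ij,ji->i", X, a_inv_xt)               # diagonal of the hat matrix
    residual = y - X @ (a_inv_xt @ y)
    return float(np.mean((residual / (1.0 - leverage)) ** 2))

# Synthetic data (hypothetical).
rng = np.random.default_rng(0)
n, d = 60, 4
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=n)

brute = loocv_brute_force(X, y, lam=1.0)
closed = loocv_closed_form(X, y, lam=1.0)
print(f"brute-force LOOCV: {brute:.6f}, closed-form LOOCV: {closed:.6f}")
```

The two values agree up to floating-point error. For losses without such an identity, each left-out sample requires a separate fit, which is the expense that approximate schemes aim to avoid.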

artificial intelligence, cross validation vector, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.80)


Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization

Neural Information Processing Systems, Oct-10-2025, 00:22:04 GMT

Training machine learning and statistical models often involves optimizing a data-driven risk criterion.

criterion, experiment, procedure, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Arizona (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)


Predictive economics: Rethinking economic methodology with machine learning

Pereira, Miguel Alves

arXiv.org Artificial Intelligence, Oct-7-2025

This article proposes predictive economics as a distinct analytical perspective within economics, grounded in machine learning and centred on predictive accuracy rather than causal identification. Drawing on the instrumentalist tradition (Friedman), the explanation-prediction divide (Shmueli), and the contrast between modelling cultures (Breiman), we formalise prediction as a valid epistemological and methodological objective. Reviewing recent applications across economic subfields, we show how predictive models contribute to empirical analysis, particularly in complex or data-rich contexts. This perspective complements existing approaches and supports a more pluralistic methodology - one that values out-of-sample performance alongside interpretability and theoretical structure.

Keywords: Predictive economics, Machine learning, Forecasting, Causal inference, Economic methodology

1. Introduction

The evolution of economics has long been shaped by advances in analytical tools.

artificial intelligence, economics, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.04726

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry:

Banking & Finance > Economy (1.00)
Energy (0.69)
Government > Regional Government > North America Government > United States Government (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.36)


Mean-Field Generalisation Bounds for Learning Controls in Stochastic Environments

Baros, Boris, Cohen, Samuel N., Reisinger, Christoph

arXiv.org Machine Learning, Aug-25-2025

When solving stochastic control problems, one is often limited by the challenge of specifying realistic model dynamics of the involved processes. Parametric approaches to estimating dynamics introduce model error, while 'model-free' approaches typically suffer from extreme curse of dimensionality constraints. The development of reliable machine-learning based methods for stochastic control is therefore of significant practical interest. In this paper, we focus on problems where a decision maker faces a stochastic environment, that is, where they interact with a system with unknown and uncontrolled stochastic dynamics, which, together with their control, induce a controlled state process and costs. Examples of this include optimal investment for a small investor - here the stochastic dynamics of assets are uncontrolled and unknown, the investor chooses a strategy based on past observations, and together these generate a wealth process which must be optimised. A second example is aerial navigation in the presence of uncertain weather - the weather is unaffected by the navigation policy chosen, while the navigator must account for uncertainties in their planning, and the resulting flight-plan needs to be optimised. In both these cases, the stochastic environment is naturally high-dimensional and may not be Markovian, and so is challenging to model statistically using finitely many observations. We consider the setting where we have access to a finite number of i.i.d.

artificial intelligence, generalisation error, machine learning, (15 more...)

arXiv.org Machine Learning

2508.16001

Country: